Did Deep Blue ace the Turing Test?
Did it do much more? It seems that 
the IBM creation not only beat the 
reigning World Champion Gary 
Kasparov, but also took a large step, 
in some people's eyes, toward true 
artificial intelligence. 


IBM's announcement of its intention to "retire" the
Deep Blue computer from chess revived the interest of
both the mass media and the general public in the chess
match between World Champion Gary Kasparov and
Deep Blue. In a sense, by refusing a rematch with
Kasparov or a new match with another grand master,
IBM closed a chapter in the history of artificial
intelligence. AI researchers had long investigated building a
machine that could defeat a world-class chess champion,
and now one had. But what did this mean?

DIFFERING INTERPRETATIONS 

One interpretation, the mass media's, held that the
match result (Deep Blue 3 1/2 to Gary Kasparov 2 1/2)
definitively proved the computer intellectually superior
to a human in a field previously considered the
exclusive domain of human intelligence: "those who
had looked to Gary Kasparov as the last hope could
now only bemoan the coming days of ascendant
computers." Technical publications devoted their attention
to descriptions of hardware-software synergy,
computer algorithms, strategy, methods of numerical
evaluation of positions, and so on.

For AI professionals, a computer defeating a human
in chess is probably neither surprising nor really
significant. After all, they contend, chess can be described
in terms of a nondeterministic alternating Turing
machine. Despite the enormous number of possible
positions and available moves (there are 10^120 possible
chess games by research mathematician Claude
Shannon's estimate), the task does not present a
challenging theoretical AI problem of NP-completeness.
There are many well-developed AI strategies that limit
the search for the best move to an analysis of the most
promising positions. Therefore, progress in the logical
and numerical methods of AI, together with growth in
computational speed and available memory, made the
computer's victory inevitable. Deep Blue's victory, then,
was attributable to its ability to analyze 200 million
positions per second and to a refined algorithm that
accounted for positional as well as material
advantage. In summary, most AI professionals
conclude that the computer won by brute force, rather
than a sophisticated or original AI strategy.
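
The idea of limiting the search to the most promising positions can be illustrated with alpha-beta pruning, a standard game-tree technique. The sketch below is a toy example on a hand-made tree, not a description of Deep Blue's actual algorithm; the tree, its leaf values, and the function name `alphabeta` are assumptions made for illustration.

```python
from math import inf

def alphabeta(node, maximizing, alpha=-inf, beta=inf, visited=None):
    """Minimax value of a toy game tree, computed with alpha-beta pruning.

    A node is either a number (a leaf's static evaluation) or a list of
    child nodes. `visited` optionally collects the leaves actually examined.
    """
    if not isinstance(node, list):      # leaf: return its evaluation
        if visited is not None:
            visited.append(node)
        return node
    best = -inf if maximizing else inf
    for child in node:
        value = alphabeta(child, not maximizing, alpha, beta, visited)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:               # remaining siblings cannot change the result
            break
    return best

# Hypothetical tree: the root maximizes, its three children minimize.
tree = [[3, 5], [2, 9], [0, 1]]
examined = []
best_value = alphabeta(tree, True, visited=examined)
```

In this toy tree the search settles on the first branch after examining only four of the six leaves, because the second and third branches are abandoned as soon as they are shown to be worse.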

What most AI experts have overlooked, though, is 
another aspect of the match, which may signify a mile¬ 
stone in the history of computer science: For the first 
time, a computer seems to have passed the Turing Test.
Table 1. Scores of participants in the 1998 Loebner Prize competition. A lower score indicates responses judged more human.

Respondent    Individual scores of the 10 judges       Median   Average
Human          3   2   1   2   1   2   4   1   1   4        2       2.1
Human          2   4   3   4   2   1   8   4   4   5        4       3.7
Human          9   1   7   5   7   7   1   3   3   1        4       4.4
Human          1   9   6   1  10   6   7   2   2   2        4       4.6
Computer       4   3   4   6   4   4   5   6  10   6      4.5       5.2
Computer       6   5   8   9   9   3  10   5   5   3      5.5       6.3
Computer       5   8   2   3   6   5   6   7   9   8        6       5.9
Computer       7   6   5   8   5  10   2   9   6  10      6.5       6.8
Computer       8   7  10   7   3   8   3   8   8   7      7.5       6.9
Computer      10  10   9  10   8   9   9  10   7   9        9       9.1


TURING TEST 

Does the ability to play chess at the highest level prove
the existence of intelligence in a computer? Alan
Turing, British mathematician and one of the founders
of computer science, considered chess to be a beginning
step in the process of programming computers
to be actually intelligent:

We may hope that machines will eventually compete
with men in all purely intellectual fields. But which
are the best ones to start with? Even this is a difficult
decision. Many people think that an abstract activity,
like the playing of chess, would be best.

In the same article, written in 1950, he asked
whether machines are capable of thinking. He
answered "yes," but the central question that remained
was how to determine if a computer could think.
Turing suggested that if the responses from the
computer were indistinguishable from those of a human, we
could say that the computer was thinking.

The Turing Test consists of the following scenario:
An interviewer (sitting in a separate room) asks a series
of questions that are randomly directed to either a
computer or a person. Based on the answers, the
interviewer must distinguish which of the two has answered
the question. If the interviewer is not able to
distinguish between them, then the computer is intelligent.

Loebner Prize 

The Loebner Prize is the first formal acknowledgment
of the Turing Test. Hugh Loebner, New York
philanthropist, and the Cambridge Center for
Behavioral Studies (Cambridge, Mass.) established the
Loebner Prize Competition in Artificial Intelligence in
1990. Loebner pledged a prize of $100,000 for the
first computer whose responses were indistinguishable
from those of a human.

The Computer Museum of Boston hosted the first
Loebner Prize competition of computer programs in
November 1991. Each year since, Loebner has
awarded a medal and $2,000 to the designer of the
computer system that is the best entry relative to other
entries that year, irrespective of its absolute success in
passing the Turing Test. In accordance with the
requirements stipulated by Loebner, the grand prize
winner must deal with audiovisual input.


An awards committee admits three to six programs
to the contest based on an initial screening, and a panel
of five to 10 judges evaluates them. Initially, the
awards committee selected the judges from the general
public. For the 1993 competition, the judges were
reporters from major US publications, a much less
docile and cooperative group of questioners. Each
judge had a chance to communicate with each program.
For some years, the contest constrained questions
to a single narrow topic (pets, for example).

1998 competition 

The most recent competition in 1998 did not limit
the scope of the questioning. Also, the panel of judges
came from a variety of backgrounds, including
journalism, philosophy, education, computer science, and
social work. The 1998 competition took place at
Australia's Powerhouse Museum in Sydney, in
conjunction with international natural language and
computational linguistics conferences hosted by Flinders
University. The four human respondents chosen to
converse with the judges via computer terminals
included a 71-year-old, a journalist, a teacher, and a
primary school student. The judges ranked the
responsiveness of both humans and computer programs,
using a scale where 1 indicated most human and 10,
least human. Table 1 shows the judges' individual
scores and lists the respondents in order, with those
judged most human first.

Significantly, the difference between the most
human computer and the most computer-like human
was only 0.6 of a point. The "Conversing with
Computers" sidebar gives a sample of the repartee
between the second judge and the computer judged
the most human overall. As shown in Table 1, the
second judge gave the computer a score of three, the most
human score awarded to this particular machine.
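
The median and average columns of Table 1, and the 0.6-point gap, can be rechecked from the judges' individual scores. The following is a minimal sketch; the scores are transcribed from Table 1, while the labels such as "Human 1" and "Computer 1" are assumptions added only to tell the rows apart (the table itself does not number the respondents).

```python
from statistics import mean, median

# Judges' scores transcribed from Table 1 (1 = most human, 10 = least human).
# Row labels are hypothetical; rows appear in the table's order.
scores = {
    "Human 1":    [3, 2, 1, 2, 1, 2, 4, 1, 1, 4],
    "Human 2":    [2, 4, 3, 4, 2, 1, 8, 4, 4, 5],
    "Human 3":    [9, 1, 7, 5, 7, 7, 1, 3, 3, 1],
    "Human 4":    [1, 9, 6, 1, 10, 6, 7, 2, 2, 2],
    "Computer 1": [4, 3, 4, 6, 4, 4, 5, 6, 10, 6],
    "Computer 2": [6, 5, 8, 9, 9, 3, 10, 5, 5, 3],
    "Computer 3": [5, 8, 2, 3, 6, 5, 6, 7, 9, 8],
    "Computer 4": [7, 6, 5, 8, 5, 10, 2, 9, 6, 10],
    "Computer 5": [8, 7, 10, 7, 3, 8, 3, 8, 8, 7],
    "Computer 6": [10, 10, 9, 10, 8, 9, 9, 10, 7, 9],
}

# (median, average rounded to one decimal) for every respondent.
summary = {name: (median(v), round(mean(v), 1)) for name, v in scores.items()}

# Lower is more human: the most human computer has the lowest computer
# average, and the most computer-like human has the highest human average.
best_computer = min(avg for name, (_, avg) in summary.items() if name.startswith("Computer"))
worst_human = max(avg for name, (_, avg) in summary.items() if name.startswith("Human"))
gap = round(best_computer - worst_human, 1)

# The second judge's score for the top-ranked computer.
judge2_score = scores["Computer 1"][1]
```

Under these assumptions, `gap` reproduces the 0.6-point difference quoted above, and `judge2_score` is the lowest score the top-ranked program received from any judge.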

In judging how successfully computers can imitate
humans, we should also consider whether humans can
disguise themselves as computers (according to the
contest transcripts, some human respondents
definitely attempted to do so in the Loebner Prize
contest). So far in these contests, no computer has given
responses totally indistinguishable from a human,
which would win the $100,000 grand prize award.
Also, judges, as a group, have not mistaken a human
Conversing with Computers

In this limited sample, Judge 2 is conversing
with the computer program that
won the 1998 Loebner Prize competition.
The full transcript is available at
http://www.phm.gov.au/whatson/pcl.htm.

Judge: Greetings, I'm judge2.

Program: I'm not sure what to talk
about with you. Are you a girl or a boy?

Judge: Girl. How are you?

Program: Are you married? I'm doing
just fine, thanks.

Judge: Would you like to talk about
something interesting?

Program: MP is an abbreviation for
Member of Parliament.

Judge: Right, why did you say that?

Program: I just felt like it.

Judge: Okay, did you hit a wrong button?

Program: Do you come from around
here?

Judge: Yes, do you?

Program: No, I don't.

Judge: Where do you come from?

Program: I am from Earth.

Judge: Well, that's very specific indeed.
What are your interests?

Program: Would you like to have a
drink with me later? Let's not get too personal.

Judge: Aren't you a Monty Python fan?

Program: Yes. I am a big Python fan.

Judge: Then what's the counter quote?

Program: I can't remember offhand.

Judge: It's "Nobody expects the Spanish
inquisition!" Now how about my
five-minute argument?

Program: And now for something
completely different. Are you fond of cats?

Judge: No, but I'd like to buy some
cheese.

Program: Have you seen any good
movies lately?

Judge: Yes, saw the James Bond one a
few weeks ago. What about you?

Program: I've been watching a lot of stuff
on home video. It just gets harder and
harder to get out to a theatre somehow. I
didn't even see Star Trek: First Contact in
the theater; I'm waiting for it to come out on
tape.


for a computer. On the basis of both median and aver¬ 
age scores in a Turing Test, it appears the judges can 
clearly distinguish humans from computers. 

A real test of intelligence or merely PR? 

There is, however, controversy surrounding the
Loebner Prize. Some opponents challenge the idea of
the Turing Test as an adequate test of intelligence.
Stuart M. Shieber, Gordon McKay Professor of
Computer Science, Harvard University, argues "that
the competition has no clear purpose, that its design
prevents any useful outcome, and that such a
competition is inappropriate given the current level of
technology." Ned Block, a professor of philosophy at
MIT, has argued that the Turing Test is a sorely
inadequate test of intelligence because it relies solely on the
ability to fool people. Marvin Minsky, Toshiba
Professor of Media Arts and Sciences at MIT,
expressed his skepticism toward the competition by
proposing a counter prize of $100 to the person who
could persuade Loebner to end his contest.
Nevertheless, in all Loebner Prize competitions some
judges have mistaken computers for humans, and
conversely some have mistaken humans for computers.

Most of the criticism came from the artificial
intelligence community. Indeed, most of the tricks that
worked had nothing to do with AI methods, but were
rather manipulations of subtle language techniques:

• the repetition of previous statements verbatim
(subject to pronominal adjustments);

• answering by repeating a judge's sentence, with
pronouns transposed, which is preceded by the
introductory "Why do you need to tell me"; and

• asking, "Why do you ask that?" which, in effect,
changes the level and/or topic of conversation.
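
The tricks listed above need very little machinery. The following sketch shows pronoun transposition with the "Why do you need to tell me" prefix, plus the stock deflection for questions. It is an illustration only, not code from any Loebner Prize entry; the swap table and the function names `transpose_pronouns` and `reply` are assumptions.

```python
import re

# Hypothetical first/second-person swaps. The table is deliberately tiny and
# crude (for instance, it always turns "you" into "I", even as an object).
SWAPS = {
    "i": "you", "me": "you", "my": "your", "am": "are",
    "you": "I", "your": "my", "are": "am",
}

def transpose_pronouns(sentence):
    """Lowercase the sentence, drop punctuation, and swap the pronouns."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return " ".join(SWAPS.get(word, word) for word in words)

def reply(judge_sentence):
    """Answer a judge without understanding anything that was said."""
    if judge_sentence.rstrip().endswith("?"):
        # Deflect questions by changing the level of the conversation.
        return "Why do you ask that?"
    # Echo statements back with the pronouns transposed.
    return f"Why do you need to tell me {transpose_pronouns(judge_sentence)}?"
```

For example, `reply("I like my cat.")` echoes the judge's statement back as a question, and any input ending in a question mark receives the same stock deflection.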

Besides, predictably enough, the judges' education,
age, and, most importantly, computer literacy and
awareness played a pivotal role in the judges' ability
to evaluate contestants.

From my point of view, the competition underscored
the multifaceted characteristics of human personality
and how the general public's view of human
intelligence differs from that of scientists. An average
citizen tends to pay more attention to the social aspects
of communication, while scientists put more emphasis
on a human's logical ability. The AI community
tends to evaluate programs not by final results, but
rather by the complexity of internal algorithms. When
it turned out that these complex methods were not up
to the challenge of passing the Turing Test, some
members of the AI community attempted to reject the
competition along with the whole idea of the Turing
Test. In response, and along with the progress
achieved by software developers, contest administrators
are gradually eliminating restrictions in the
Loebner competition. They are attempting to bring
the contest closer to the AI community and to combine
it with scientific conferences.

A REAL-LIFE TURING TEST

However, it was neither the complexity of an algorithm
nor the power of the computer that made Deep
Blue's match victory so remarkable. It was Gary
Kasparov's reaction that proved the computer's
intelligence according to Alan Turing's classical
definition of artificial intelligence. This recent
chess match gave us an excellent example of a
real-life Turing Test. At numerous press conferences
during and after the match, Kasparov
expressed doubts that he played against the
computer itself. He implied that there was some untoward
behavior by the Deep Blue team, saying
that "I have no idea what's happening behind the
curtain." Kasparov also alluded to famous soccer
player Diego Maradona, who allegedly
scored a goal with his hand (as a postgame slow-motion
film suggested), though it appeared (during the game)
as if he had used his head. Kasparov stopped short of
directly accusing the Deep Blue team of human
intervention in the process of selecting moves, but went so
far as to admit the appearance of human intelligence in
the computer's actions. It did appear as if Kasparov
confused the computer with a human.

Kasparov's suspicions were shared by some of his
fellow grand masters. But is it possible to draw a clear
demarcation between a computer's and a human's chess
moves? Anjelina Belakovskaia is the US Women's Chess
Champion, International Grand Master, and an active
proponent of human-computer cooperation in chess.
She says it was easier to distinguish the moves of
previous versions of computer chess programs from human
players because computers clearly paid much more
attention to material rather than positional advantage.
In contrast, the latest version of Deep Blue considered
positional as well as material advantages, played much
more aggressively, and for these reasons could be
mistaken for a human.

Even prior versions of chess computers were
difficult to distinguish from human chess players. In
1991, Frederic Friedel, a chess journalist from
Germany and the author of a popular chess program,
conducted an informal version of the Turing Test in
chess with Gary Kasparov. Kasparov's task was to
identify a computer, Deep Thought (an earlier version
of Deep Blue), by reviewing a database containing
game records of a chess tournament. In a fairly
randomized experiment, Kasparov was able to identify a
computer among eight players in 50 percent of the
rounds. If in 1991 Gary Kasparov, an expert user and
promoter of chess computers, barely passed this test,
then in 1997 it seems that he failed to distinguish
between human and artificial intellect. Did Deep Blue
become the first computer to pass a Turing Test on
artificial intelligence? It would seem so.
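
For context on the 1991 experiment, a blind guess among eight players would pick out the computer one time in eight. The sketch below compares that baseline with the reported 50 percent rate. The number of rounds is not stated here, so `ROUNDS = 10` is purely a hypothetical value chosen to show how such a comparison could be made; the function name `tail_probability` is likewise an assumption.

```python
from math import comb

def tail_probability(successes, rounds, p):
    """P(X >= successes) for X ~ Binomial(rounds, p)."""
    return sum(
        comb(rounds, k) * p**k * (1 - p) ** (rounds - k)
        for k in range(successes, rounds + 1)
    )

chance = 1 / 8            # one computer among eight players
observed = 0.5            # identification rate reported for the experiment
ratio = observed / chance # how many times better than blind guessing

ROUNDS = 10               # hypothetical: the actual number of rounds is not given
hits = int(observed * ROUNDS)
p_by_luck = tail_probability(hits, ROUNDS, chance)
```

With these assumed numbers, a 50 percent hit rate is four times the guessing baseline, and the probability of doing at least that well by luck alone comes out below one percent.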

Alan Turing was probably right in considering
chess the area of human intellect most amenable
to computer simulation. He was also
right in predicting that computers would demonstrate
near-human intellect by the end of the century.
However, as computers have evolved since Turing's
time, the two intellects have tended to be regarded as
very separate, even adversarial.

Anjelina Belakovskaia thinks it is time to stop
testing computers and playing "us against them," and
instead begin using their power in collaboration. She
is planning to participate in the first match against
another human chess player where both players are
allowed to use the help of computers as partners.

The possibilities arising from collaboration among
humans and computers are even more intriguing than
their differences were almost 50 years ago. I think
Alan Turing would have agreed.